
External models (Gemini Nano Banana & OpenAI GPT Image) (#8633) #8884

Merged
lstein merged 63 commits into invoke-ai:main from CypherNaught-0x:external-models
Apr 20, 2026

Conversation

@CypherNaught-0x
Contributor

Summary

This PR adds support for external model provider APIs with Google and OpenAI added for now.
It supports txt2img, img2img and image references.
I tried to make it fit well within the application and be easily extensible for future models.

Related Issues / Discussions

#8633 includes functionality requested here

QA Instructions

Select an external provider in the model setup dialog and add an API key.
Select a new model from the dropdown list.
...
Profit

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • Tests added / updated (if applicable)
  • ❗Changes to a redux slice have a corresponding migration
  • Documentation added / updated (if applicable)
  • Updated What's New copy (if doing a release after this PR)

@github-actions bot added the api, python, invocations, backend, services, frontend, python-tests, and docs labels on Feb 17, 2026
@Pfannkuchensack Pfannkuchensack self-assigned this Feb 19, 2026
@Pfannkuchensack
Collaborator

I did some testing. Works fine (I only tested Gemini).

A few comments

  • Reidentify button: The "Reidentify" button in the Model Manager should not be shown for external models.
  • Auto-install starter models: Auto-install should always be enabled for external starter models. When an API key is removed, the associated external models should also be removed.
  • Install queue status: The install queue shows "Unknown" when installing external models. This needs to display the correct model name/status.
  • Starter model description: The text for external starter models needs to clearly indicate that an API key is required and that usage may incur costs. (The starter models are also not needed if auto-install is always on.)
  • Canvas settings for external models: It is currently unclear which canvas settings are actually passed to external models. Right now all standard settings (Scheduler, Steps, CFG Scale, and everything under Advanced) are displayed, but most of these are not used by external models. We need a solution where external models can define their required settings as JSON, and the frontend renders only the relevant controls based on that definition.
  • External Image Generation node: The "External Image Generation" node also contains these values. The core issue is that we cannot have dynamic nodes. Instead, we should have a dedicated settings node for each external model node.
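The capability-driven settings idea in the last two bullets could be sketched as a small per-model JSON definition the backend exposes and the frontend renders from. Everything here (field names, schema shape, the `visible_controls` helper) is invented for illustration and is not the PR's actual format:

```python
import json

# Hypothetical settings schema an external model could expose so the
# frontend renders only the relevant controls. Field names are invented.
GEMINI_SETTINGS_SCHEMA = {
    "model": "gemini-nano-banana",
    "settings": [
        {"name": "temperature", "type": "float", "min": 0.0, "max": 2.0, "default": 1.0},
        {"name": "aspect_ratio", "type": "enum", "choices": ["1:1", "16:9"], "default": "1:1"},
    ],
    # Standard controls absent from this list (scheduler, cfg_scale, ...)
    # would simply not be rendered for this model.
}

def visible_controls(schema: dict) -> list[str]:
    # The frontend would iterate this list instead of showing the full
    # standard set of canvas settings.
    return [s["name"] for s in schema["settings"]]

payload = json.dumps(GEMINI_SETTINGS_SCHEMA)
```

The same definition could drive a dedicated per-model settings node, sidestepping the dynamic-nodes limitation mentioned above.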

@Pfannkuchensack
Collaborator

Some changes that should be made:

In invokeai/app/services/external_generation/external_generation_default.py, the method _refresh_model_capabilities does:

from invokeai.app.api.dependencies import ApiDependencies
record = ApiDependencies.invoker.services.model_manager.store.get_model(request.model.key)

No other service in the codebase imports from invokeai.app.api.dependencies. All other services receive their dependencies via constructor injection through InvocationServices. This is an architectural violation that makes the service harder to test in isolation and creates a hidden coupling between the service and API layers.
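The constructor-injection pattern the other services follow could look roughly like this; the class and store names below are illustrative stand-ins, not the actual PR code:

```python
# Hypothetical sketch of constructor injection, mirroring how other
# InvokeAI services receive dependencies via InvocationServices.
class ExternalGenerationService:
    def __init__(self, model_store):
        # The store is handed in at construction time instead of being
        # pulled from the global ApiDependencies singleton at call time.
        self._model_store = model_store

    def _refresh_model_capabilities(self, model_key: str):
        # No import of invokeai.app.api.dependencies needed here.
        return self._model_store.get_model(model_key)


class FakeStore:
    """Stand-in store showing the service is now testable in isolation."""
    def get_model(self, key):
        return {"key": key, "capabilities": ["txt2img"]}


service = ExternalGenerationService(model_store=FakeStore())
record = service._refresh_model_capabilities("gemini-nano-banana")
```

With the store injected, a unit test can pass a fake store without standing up the whole API layer.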


In invokeai/app/services/model_install/model_install_common.py:

MODEL_SOURCE_TO_TYPE_MAP = {
    ...
    ExternalModelSource: ModelSourceType.Url,
}

ExternalModelSource is not a URL source. There is no ModelSourceType.External enum value in taxonomy.py. This means external models get recorded as Url-type sources in the database, which is semantically incorrect and could cause issues in any code that branches on source_type.
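A dedicated enum member would make the stored source type honest; the member names and values below are a sketch and may not match taxonomy.py exactly:

```python
from enum import Enum

# Hypothetical sketch: add an External member to the source-type enum
# instead of mapping ExternalModelSource to Url.
class ModelSourceType(str, Enum):
    Path = "path"
    Url = "url"
    HFRepoID = "hf_repo_id"
    External = "external"  # new member for provider-hosted models


class ExternalModelSource:  # stand-in for the real source class
    pass


MODEL_SOURCE_TO_TYPE_MAP = {
    ExternalModelSource: ModelSourceType.External,
}

source_type = MODEL_SOURCE_TO_TYPE_MAP[ExternalModelSource]
```

Code that branches on `source_type` can then distinguish external models from genuinely URL-installed ones.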


In invokeai/app/api/routers/app_info.py:

for config in (runtime_config, file_config):
    config.update_config(updates)
    for field_name, value in updates.items():
        if value is None:
            config.model_fields_set.discard(field_name)

This directly mutates the model_fields_set of the global singleton InvokeAIAppConfig, bypassing Pydantic's field-tracking internals. Concurrent requests to set_external_provider_config or reset_external_provider_config could race on this shared mutable set.
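One minimal way to make the read-modify-write safe is to serialize it behind a lock; the config class here is a stand-in that only mimics the relevant Pydantic behavior:

```python
import threading

# Sketch: serialize updates to the shared config objects with a
# module-level lock so concurrent requests cannot race on
# model_fields_set. FakeConfig stands in for InvokeAIAppConfig.
_config_lock = threading.Lock()

class FakeConfig:
    def __init__(self):
        self.model_fields_set = set()
        self.values = {}

    def update_config(self, updates):
        for k, v in updates.items():
            self.values[k] = v
            self.model_fields_set.add(k)


def apply_provider_updates(configs, updates):
    # Holding the lock makes the whole update atomic with respect to
    # other requests using the same helper.
    with _config_lock:
        for config in configs:
            config.update_config(updates)
            for field_name, value in updates.items():
                if value is None:
                    config.model_fields_set.discard(field_name)


runtime, file_cfg = FakeConfig(), FakeConfig()
apply_provider_updates([runtime, file_cfg], {"gemini_key": "abc", "openai_key": None})
```

A deeper fix would avoid touching `model_fields_set` directly at all, but a lock at least removes the race.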


In invokeai/app/services/model_install/model_install_default.py, _register_external_model generates a deterministic key via slugify(f"{provider_id}-{provider_model_id}"). Installing the same external model twice produces the same key. While the DB layer catches this with DuplicateModelException, there is no proactive check or update-if-exists logic, resulting in an unhelpful error for the user.
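An update-if-exists check before the insert would make re-installation idempotent; the store interface and `slugify` below are simplified stand-ins for the real helpers:

```python
import re

# Hypothetical update-if-exists sketch for _register_external_model:
# check the store before inserting so reinstalling the same provider
# model updates the existing record instead of raising
# DuplicateModelException.
def slugify(text: str) -> str:
    # simplified stand-in for the real slugify helper
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

class FakeStore:
    def __init__(self):
        self.records = {}
    def exists(self, key):
        return key in self.records
    def add_model(self, key, config):
        self.records[key] = config
    def update_model(self, key, config):
        self.records[key] = config

def register_external_model(store, provider_id, provider_model_id, config):
    key = slugify(f"{provider_id}-{provider_model_id}")
    if store.exists(key):
        store.update_model(key, config)  # idempotent re-install
    else:
        store.add_model(key, config)
    return key

store = FakeStore()
k1 = register_external_model(store, "google", "Nano Banana", {"v": 1})
k2 = register_external_model(store, "google", "Nano Banana", {"v": 2})
```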


In invokeai/app/api/routers/model_manager.py, list_model_records uses setattr(model, "capabilities", ...) and setattr(model, "default_settings", ...) on Pydantic model instances. Pydantic v2 models may not support direct attribute mutation without validate_assignment = True. The PR itself uses model_copy(update=...) correctly in other places (e.g., _apply_starter_overrides in external_generation_default.py), so this is inconsistent.
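The copy-with-update pattern (what Pydantic v2's `model_copy(update=...)` does) can be illustrated with a frozen stdlib dataclass as an analogy; the field names are invented:

```python
from dataclasses import dataclass, replace

# Analogy for Pydantic v2's model_copy(update=...): a frozen dataclass
# forbids direct attribute mutation, so updates must produce a new
# instance, leaving the original untouched.
@dataclass(frozen=True)
class ModelRecord:
    key: str
    capabilities: tuple = ()

record = ModelRecord(key="gpt-image-1")
# setattr(record, "capabilities", ...) would raise FrozenInstanceError
# here, just as setattr on a Pydantic v2 model without
# validate_assignment=True is unreliable.
updated = replace(record, capabilities=("txt2img", "img2img"))
```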


I don't think saving the API keys in invoke.yaml is the best choice here. This is something the development team should discuss in https://discord.com/channels/1020123559063990373/1049495067846524939.

@lstein
Collaborator

lstein commented Feb 23, 2026

@CypherNaught-0x Please see the changes requested from @Pfannkuchensack above.

@CypherNaught-0x
Contributor Author

Thanks @Pfannkuchensack for the valuable feedback!

I didn't feel comfortable disabling all the inputs as I didn't see this done elsewhere but you are right of course in that it makes no sense to show for example CFG when that property is not used.
For the other details I went with the interface of model_supports_x. We could do this for the advanced properties as well.
From a UX standpoint I am wondering if it's better to disable these properties or hide them completely.
Do you have any preference or insight on how this is handled elsewhere?

This was more of a first draft since I wasn't sure how such a large addition would be received so I haven't yet spent much time polishing things like the install queue. I was positively surprised with the feedback so I'll try and get things to a more polished state for the next review round.
I also haven't really done extensive testing on the OpenAI implementation so I will get that done also. Glad things worked for Gemini on your end though. I tested on different systems with fresh installs but it's nice to have external confirmation.

How are the discussions on the API Key storage coming along? I saw that the model marketplace can store API keys there as well so figured with a decently restricted API key this might be ok though I'd obviously also prefer at least non-plain-text storage.

@Pfannkuchensack
Collaborator

Pfannkuchensack@3c83692: I did some work on hiding unneeded things in the UI. Maybe take a look, or copy the whole thing from there.

@Pfannkuchensack
Collaborator

The API keys should be stored in a separate YAML file. This is better because it keeps the API keys separate from the main config.

@lstein
Collaborator

lstein commented Feb 24, 2026

The API keys should be stored in a separate YAML file. This is better because it keeps the API keys separate from the main config.

We need a unified place to stash users' security tokens and API keys. I just now proposed a "Token Manager" in Issue #8904. Temporarily, you could add nano_banana_key and gpt_image_key to the InvokeAIAppConfig class in invokeai.app.services.config.config_default and stash the keys in invokeai.yaml.

@Pfannkuchensack Does this seem like a reasonable interim solution to API key storage or would it be better to have a completely separate API keys file, like ~/invokeai/api_keys.yaml?

@Pfannkuchensack
Collaborator

I would prefer the separate file, especially since there will be another solution later, thus avoiding major changes to the invoke.yaml file.
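As a concrete illustration of the separate-file option, a hypothetical ~/invokeai/api_keys.yaml might look like this (key names follow the interim suggestion above; the file name and format are not agreed upon):

```yaml
# Hypothetical layout for a separate API-keys file, kept out of
# invokeai.yaml so a later Token Manager can replace it cleanly.
external_providers:
  nano_banana_key: "AIza..."
  gpt_image_key: "sk-..."
```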

@lstein
Collaborator

lstein commented Feb 27, 2026

@CypherNaught-0x I'm wondering what you see as the timetable for this? I'm thinking we'll be ready for a 6.12 release in the second week of March. Would that be targetable, or later? The release after that will likely be mid April.

@CypherNaught-0x
Contributor Author

Pfannkuchensack@3c83692: I did some work on hiding unneeded things in the UI. Maybe take a look, or copy the whole thing from there.

I had already started work on this and your implementation looks very similar so I'll try and integrate them. Also very much looking forward to Seedream support 👍

@CypherNaught-0x
Contributor Author

CypherNaught-0x commented Feb 27, 2026

@lstein I've had some time to work on it. I'll try and get things into a polished state and push the changes. I believe mid-March should be a very reasonable release target.

…rnal graph

- Export imageSizeChanged from paramsSlice (required by the new ImageSize
  recall handler).
- Emit the external graph's metadata model entry via zModelIdentifierField
  since ExternalApiModelConfig is not part of the AnyModelConfig union.
@lstein
Collaborator

lstein commented Apr 17, 2026

Thanks for the fixes. I've done some functional testing with the Gemini models and encountered a few remaining hitches.

  1. Although you can no longer add inpaint mask layers to the canvas, when you create a new canvas it is still initialized with a default (empty) inpaint mask layer. If inpainting masking isn't supported, then there shouldn't be an inpaint mask at all.
  2. The gemini invocation node has fields for Init Image and Mask Image. However, my understanding of the API is that Gemini doesn't support raster-based img2img or image masks. If this is so, these should be removed from the node.
  3. Generating with the gemini node gives me the following error: Invalid JSON payload received. Unknown name \"thinkingConfig\": Cannot find field. This occurs with each of the three starter models.
  4. The OpenAI GPT models don't accept raster images for img2img or image masks, but the DALL-E models do. Perhaps the OpenAI models should be split into two invocation nodes, one with the raster parameters and the other without?

@Pfannkuchensack
Collaborator

https://ai.google.dev/gemini-api/docs/image-generation?hl=en#2_inpainting_semantic_masking
there is a mask feature.

I'll take a look at the rest.

@lstein
Collaborator

lstein commented Apr 18, 2026

I'm still uncertain that inpaint masks are usable with the external models.

Observations on the OpenAI models, using the node editor:

  • Using the OpenAI Image Generation node in either inpaint or img2img mode with any of the three GPT Image models or DALL-E3 results in ExternalProviderCapabilityError: Mode 'img2img'/'inpaint' is not supported.
  • inpaint mode is accepted by DALL-E2, but doesn't seem to result in any change to the init image. I tried with both a bitmap and with an image that used the alpha channel for the mask.
  • DALL-E2 in img2img mode with an init image but no mask gives "Invalid input image - format must be in ['RGBA', 'LA', 'L']". These problems may be related to DALL-E using the alpha channel of the image as its mask. However, when I feed an RGBA init image to DALL-E2 in img2img mode, it still complains that it needs an RGBA image, which I find confusing.
  • I see no difference between putting an image into init image vs reference images.

Observations on the Gemini models, using the node editor:

  • The node accepts all combinations of img2img, inpaint with or without the init and mask images and does not complain. However, I can't get the mask to have any effect. Looking at the Gemini documentation regarding the mask, the inpaint mask appears to refer to a prompt semantic mask, not a bitmask or transparency channel.
  • I see no difference between putting an image into init image vs reference images.

If image mask-based inpainting isn't working, let's just remove the init image and mask image fields from the nodes.

The docs conflicts should disappear when I merge the new docs PR in.

Pfannkuchensack and others added 3 commits April 18, 2026 22:56
- Remove img2img and inpaint modes from Gemini models (Gemini has no
  bitmap mask or dedicated edit API; image editing works via reference
  images in the UI)
- Fix DALL-E 2 inpainting: convert grayscale mask to RGBA with alpha
  channel transparency (OpenAI expects transparent=edit area) and
  convert init image to RGBA when mask is present
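The mask conversion described in this commit can be sketched with Pillow. The helper name is invented, and white-means-edit for the incoming grayscale mask is an assumption about InvokeAI's convention; OpenAI's edits endpoint treats fully transparent pixels as the region to replace:

```python
from PIL import Image

# Sketch: convert a grayscale inpaint mask to the RGBA format the
# OpenAI edits endpoint expects, where transparent pixels mark the
# area to edit. Assumes white = edit in the incoming mask.
def mask_to_openai_rgba(mask: Image.Image) -> Image.Image:
    gray = mask.convert("L")
    # Invert: white (255, edit) -> alpha 0 (transparent);
    # black (0, keep) -> alpha 255 (opaque).
    alpha = gray.point(lambda v: 255 - v)
    rgba = Image.new("RGBA", gray.size, (0, 0, 0, 255))
    rgba.putalpha(alpha)
    return rgba

mask = Image.new("L", (4, 4), 255)  # all-white mask: edit everywhere
converted = mask_to_openai_rgba(mask)
```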
@lstein
Collaborator

lstein commented Apr 19, 2026

@Pfannkuchensack Thanks for the recent fixes and I'm looking forward to getting this PR finished and merged. It's been a lot of work!

Unfortunately I haven't been able to get inpainting working with the DALL-E2 model (which as far as I can tell is the only model that uses a mask). I assign a black and white bitmap mask, but the entire image gets modified, not just the masked region. Does inpainting work in your hands?

I also have a philosophical question about whether the Gemini invocation node should even show the modes, the init image and the mask image fields. These are not supported by any Gemini model, and showing them as usable UI fields may confuse people.

Similarly, please consider whether the OpenAI invocation node should show these fields, since only the old DALL-E2 model uses them. The others are edit models.

- Remove DALL-E 2 from starter models (deprecated, shutdown May 12 2026)
- Enable img2img for GPT Image 1/1.5/1-mini (supports edits endpoint)
- Set Gemini models to txt2img only (no mask/edit API; editing via
  ref images)
- Hide mode/init_image/mask_image fields on Gemini node (not usable)
- Hide mask_image field on OpenAI node (no model supports inpaint)
@lstein
Collaborator

lstein commented Apr 19, 2026

The Gemini invocation node looks good and is ready to go.

Major comment
Reading the OpenAI documentation, it looks like the GPT and DALL-E models take an "action" setting of auto, generate or edit but no mode directly corresponding to inpaint or img2img (https://developers.openai.com/api/docs/guides/image-generation). Internally I see OpenAIProvider.generate() calls either the generate or edit API endpoint depending on whether an init image or at least one reference image is provided. In fact, there is no functional difference between the init image and the reference image.

I suggest:

  1. Either hide the OpenAI node's mode parameter entirely, or replace it with an action of [auto, generate, edit] to follow the OpenAI API more closely. I am happy with the easier of the two solutions.
  2. Hide the OpenAI node's init_image field. It doesn't do anything that the ref image field doesn't, and may confuse people who think it will behave as a raster layer.

Minor comment
The OpenAI invocation node no longer accepts an inpaint mask (correct decision), but still has inpaint as one of its modes. However, this is nonfunctional, so maybe it should be removed?

@lstein
Collaborator

lstein commented Apr 19, 2026

Another thing I have noticed when using either of the external generation nodes. When I have "Use cache" and "Save to Gallery" checked in the external generation node and hit the Invoke button multiple times, I only get an image output to the gallery on the first try. On the subsequent generations, the log indicates that the job is not queued and there is no output. However, the Invoke button continues to spin as if something were happening.

I would have expected to get multiple identical images using the cached values rather than no image.

- Hide OpenAI node's mode and init_image fields: OpenAI's API has no
  img2img/inpaint distinction (the edits endpoint is invoked
  automatically when reference images are provided). init_image is
  functionally identical to a reference image and was misleading users.
- Default use_cache to False for external image generation nodes:
  external API calls are non-deterministic and incur usage costs.
  Cache hits returned stale image references that did not produce new
  gallery entries on repeat invokes.
External image generation nodes use the standard invocation cache, but
returning the cached output (with stale image_name references) on cache
hits resulted in no new gallery entries — the Invoke button would spin
indefinitely on repeat invokes with identical parameters.

Override invoke_internal so that on cache hit, the cached images are
loaded and re-saved as new gallery entries. The expensive API call is
still skipped (cost saving), but the user sees a new image as expected.
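The cache-hit re-save behavior this commit describes can be illustrated with a toy sketch; every class and method name below is a stand-in, not the actual InvokeAI invocation API:

```python
# Toy sketch: on a cache hit, skip the external API call (cost saving)
# but still save the cached image as a fresh gallery entry, so repeat
# invokes produce visible output instead of spinning indefinitely.
class FakeGallery:
    def __init__(self):
        self.entries = []
    def save(self, image_bytes):
        name = f"image_{len(self.entries)}.png"
        self.entries.append(name)
        return name

class ExternalNode:
    def __init__(self, gallery):
        self.gallery = gallery
        self._cache = {}
        self.api_calls = 0

    def _call_external_api(self, prompt):
        self.api_calls += 1
        return b"fake-image-bytes"

    def invoke(self, prompt):
        if prompt in self._cache:
            image_bytes = self._cache[prompt]  # hit: no API cost
        else:
            image_bytes = self._call_external_api(prompt)
            self._cache[prompt] = image_bytes
        # Always create a new gallery entry, even on a cache hit.
        return self.gallery.save(image_bytes)

gallery = FakeGallery()
node = ExternalNode(gallery)
first = node.invoke("a banana")
second = node.invoke("a banana")
```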
@lstein
Collaborator

lstein commented Apr 19, 2026

Almost there! I found just one more thing that I missed on the first go-rounds. In the OpenAI models, the "Remix" recall function is not restoring the advanced Quality, Background or Input Fidelity settings.

Pfannkuchensack and others added 2 commits April 19, 2026 22:03
Remix recall iterates through ImageMetadataHandlers but only Gemini's
temperature handler was wired up — OpenAI's quality, background, and
input_fidelity were stored in image metadata but never parsed back into
the params slice. Add the three missing handlers so Remix restores
these settings as expected.
@lstein lstein self-requested a review April 20, 2026 17:02
Collaborator

@lstein lstein left a comment


Looking good.

@lstein lstein enabled auto-merge (squash) April 20, 2026 17:03
@lstein lstein merged commit 9deb545 into invoke-ai:main Apr 20, 2026
16 checks passed

Labels

api, backend, docs, frontend, invocations, python, python-tests, Root, services, v6.13.x

Projects

Status: 6.13.x Theme: MODELS

Development

Successfully merging this pull request may close these issues.

3 participants